Estimation in exponential family regression based on linked data contaminated by mismatch error
Authors
Abstract
Identification of matching records in multiple files can be a challenging and error-prone task. Linkage errors can considerably affect subsequent statistical analysis based on the resulting linked file. Several recent papers have studied post-linkage linear regression, with the response variable in one file and the covariates in a second file, from the perspective of the "Broken Sample Problem" and "Permuted Data". In this paper, we present an extension of this line of research to exponential family regression, given the assumption of a small to moderate number of mismatches. A method based on observation-specific offsets to account for potential mismatches and $\ell_1$-penalization is proposed, and its properties are discussed. We also present sufficient conditions for the recovery of the correct correspondence between covariates and responses if the regression parameter is known. The proposed approach is compared to established baselines, namely the methods by Lahiri-Larsen and Chambers, both theoretically and empirically on synthetic and real data. The results indicate that substantial improvements over those methods can be achieved even if only limited information about the linkage process is available.
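The idea of observation-specific offsets with $\ell_1$-penalization can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: logistic regression is chosen as one exponential family example, the objective is the negative log-likelihood with linear predictor X @ beta + gamma plus lam times the $\ell_1$ norm of gamma, and the solver is plain proximal gradient descent. The names fit_offset_logistic, soft_threshold, gamma and lam are hypothetical, and the paper's actual estimator and algorithm may differ.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, beta, gamma, lam):
    # Logistic negative log-likelihood with per-observation offsets gamma,
    # plus an l1 penalty on gamma only (beta is left unpenalized).
    eta = X @ beta + gamma
    return np.sum(np.logaddexp(0.0, eta) - y * eta) + lam * np.sum(np.abs(gamma))

def fit_offset_logistic(X, y, lam, n_iter=500):
    # Hypothetical solver: proximal gradient descent on (beta, gamma) jointly.
    n, d = X.shape
    beta, gamma = np.zeros(d), np.zeros(n)
    # The gradient of the smooth part is Lipschitz with constant
    # 0.25 * ||[X, I]||_2^2 = 0.25 * (sigma_max(X)^2 + 1).
    step = 1.0 / (0.25 * (np.linalg.norm(X, 2) ** 2 + 1.0))
    history = [objective(X, y, beta, gamma, lam)]
    for _ in range(n_iter):
        eta = X @ beta + gamma
        resid = 0.5 * (1.0 + np.tanh(eta / 2.0)) - y  # fitted mean minus response
        beta = beta - step * (X.T @ resid)
        gamma = soft_threshold(gamma - step * resid, step * lam)
        history.append(objective(X, y, beta, gamma, lam))
    return beta, gamma, history

# Synthetic illustration (assumed setup): 200 records, the first 20 responses
# are shuffled among themselves to mimic mismatches in the linked file.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, -1.0, 0.5])
prob = 0.5 * (1.0 + np.tanh(X @ beta_true / 2.0))
y = (rng.random(200) < prob).astype(float)
y[:20] = y[rng.permutation(20)]

beta_hat, gamma_hat, history = fit_offset_logistic(X, y, lam=0.3)
print("records with nonzero offset:", int(np.count_nonzero(gamma_hat)))
```

In this sketch, records whose fitted offset is nonzero are the ones the penalized fit treats as possibly mismatched. For the logistic case the residual is bounded by 1 in absolute value, so any lam of 1 or more forces all offsets to zero and the fit reduces to ordinary logistic regression.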
Similar resources
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We proposed an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
Sequential estimation in a subclass of exponential family under weighted squared error loss
In a subclass of the scale-parameter exponential family, we consider the sequential point estimation of a function of the scale parameter under the loss function given as the sum of the weighted squared error loss and a linear cost. For a fully sequential sampling scheme, second order expansions are obtained for the expected sample size as well as for the regret of the procedure. The former resear...
Error Bounds in Parameter Estimation Under Mismatch
In this paper we develop a new upper bound for the mean square estimation error of a parameter that takes values on a bounded interval. The bound is based on the discretization of the region into a finite number of points, and the determination of the estimate by a maximum likelihood procedure. It is assumed that inaccurate versions of the true spectra are utilized in the implementation of the ...
Estimation in Functional Regression for General Exponential Families by Winston
This paper studies a class of exponential family models whose canonical parameters are specified as linear functionals of an unknown infinite-dimensional slope function. The optimal minimax rates of convergence for slope function estimation are established. The estimators that achieve the optimal rates are constructed by constrained maximum likelihood estimation with parameters whose dimension g...
Data reconciliation and gross error diagnosis based on regression
In this article we show that the linear reconciliation problem can be represented by a standard multiple linear regression model. The appropriate criteria for redundancy, determinability and gross error detection are shown to follow in a straightforward manner from the standard theory of linear least squares. The regression approach suggests a natural measure of the redundancy of an observation...
Journal
Journal title: Statistics and Its Interface
Year: 2023
ISSN: ['1938-7989', '1938-7997']
DOI: https://doi.org/10.4310/22-sii726